AIbase

# Proximal Policy Optimization

## Stable Vicuna 13b Delta
StableVicuna-13B is a fine-tuned version of the Vicuna-13B v0 model, further trained with Reinforcement Learning from Human Feedback (RLHF), using Proximal Policy Optimization (PPO) as the reinforcement learning algorithm, on a variety of dialogue and instruction datasets.
Large Language Model · Transformers · English
CarperAI
31
455
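
Both entries on this page are trained with PPO. To illustrate what that optimizer maximizes, the sketch below computes PPO's clipped surrogate objective, the mean over samples of min(r_t · A_t, clip(r_t, 1 − ε, 1 + ε) · A_t), where r_t = π_new(a_t | s_t) / π_old(a_t | s_t) and A_t is the advantage. It is a minimal, stdlib-only sketch, not CarperAI's training code; the helper name `clipped_surrogate`, its list-based inputs, and the default ε = 0.2 are illustrative assumptions.

```python
import math
from typing import Sequence


def clipped_surrogate(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed clipping range (a common choice)
) -> float:
    """Return the mean PPO clipped surrogate objective (to be maximized).

    Hypothetical helper for illustration: inputs are per-sample log-probabilities
    of the taken actions under the new and old policies, plus advantages.
    """
    n = len(advantages)
    if n == 0 or not (len(new_logp) == len(old_logp) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio into [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / n


if __name__ == "__main__":
    # The new policy doubled an action's probability (ratio 2.0) and the action
    # had a positive advantage, so the objective is capped at (1 + eps) * A.
    print(clipped_surrogate([math.log(2.0)], [0.0], [1.0]))
```

Taking the minimum means the policy gains nothing from pushing the ratio beyond the clip range when that would increase the objective, which keeps each update close to the data-collecting policy.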
## PPO LunarLander-v2
This is a reinforcement learning agent trained with the PPO algorithm to solve the landing task in the LunarLander-v2 environment.
Physics Model
sb3 (Stable-Baselines3)
73
0
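
The LunarLander entry is published under sb3, the Stable-Baselines3 organization. Stable-Baselines3's PPO estimates advantages with Generalized Advantage Estimation (GAE), with defaults γ = 0.99 and λ = 0.95. The sketch below shows the GAE recursion for a single rollout: δ_t = r_t + γ(1 − done_t)V(s_{t+1}) − V(s_t) and Â_t = δ_t + γλ(1 − done_t)Â_{t+1}. It is an illustrative reimplementation, not Stable-Baselines3's code; the function name `compute_gae`, its list-based arguments, and the convention that `dones[t]` marks an episode ending after step t are assumptions.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for one rollout of length T.

    Hypothetical helper: values[t] = V(s_t), dones[t] is True if the episode
    ended after step t, and last_value = V(s_T) bootstraps the final step.
    """
    n = len(rewards)
    if not (len(values) == len(dones) == n):
        raise ValueError("rewards, values and dones must have equal length")
    advantages = [0.0] * n
    gae = 0.0
    # Walk backwards so each step reuses the advantage of the step after it.
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors, reset at episode boundaries.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    # Value-function targets: advantage plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With λ = 0 each advantage reduces to the one-step TD error, and with λ = 1 it becomes the discounted return minus the baseline; intermediate values trade bias against variance.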
© 2025 AIbase